Abstract

We present an approach for the static analysis of programs handling
arrays, with a Galois connection between the semantics of the array
program and semantics of purely scalar operations. The simplest way to
implement it is by automatic, syntactic transformation of the array
program into a scalar program, followed by analysis of the scalar program
with any static analysis technique (abstract interpretation, acceleration,
predicate abstraction, ...). The scalar invariants thus obtained are
translated back onto the original program as universally quantified array
invariants. We illustrate our approach on a variety of examples, leading to
the “Dutch flag” algorithm.


1 Introduction 

Static analysis aims at automatically discovering program properties.
Traditionally, it has focused on dataflow properties (e.g. “can this pointer
be null?”), then on numerical properties (e.g. “$2x + y \le 45$ at every
iteration of this loop”). When it comes to programs operating over arrays,
special challenges arise. For instance, the Astrée static analyzer, based on
abstract interpretation and commercially used in the avionics, automotive and
other industries, supports arrays simplistically: it either “smashes” all
cells in a single array into a single abstract value, or expands an array of
n cells into n variables; in many cases it is necessary to fully unroll loops
operating over an array in order to prove the desired property.¹

In general, however, analyzing array programs entails exhibiting inductive
loop invariants with universal quantification over array indices. Neither
smashing nor expansion can prove, in general, that a simple initialization
loop truly works:


Listing 1: Simple array initialization 
int t[n]; for(int i=0; i<n; i++) t[i] = 0; 

To derive the postcondition $\forall k.\ 0 \le k < n \rightarrow t[k] = 0$,
one uses the loop invariant (in the Floyd-Hoare sense)
$0 \le i \le n \land \forall k.\ 0 \le k < i \rightarrow t[k] = 0$. The
$0 \le i \le n$ part (or generalizations, e.g., filling the upper triangular
part of a matrix) can be automatically inferred by many existing numeric
analysis techniques. In contrast, the
$\forall k.\ 0 \le k < i \rightarrow t[k] = 0$ part is trickier and is the
focus of this article.
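
As an illustration only, the loop invariant above can be checked dynamically
on a concrete run. The sketch below is not part of the analysis; the function
names `invariant_holds` and `init_and_check` are hypothetical, and the array
is passed in by the caller rather than declared as in Listing 1.

```c
#include <stddef.h>

/* Returns 1 if 0 <= i <= n and t[k] == 0 for all 0 <= k < i. */
static int invariant_holds(const int *t, size_t n, size_t i) {
    if (i > n) return 0;
    for (size_t k = 0; k < i; k++)
        if (t[k] != 0) return 0;
    return 1;
}

/* Runs the initialization loop on a caller-supplied buffer and checks the
   invariant at every loop head. Returns 1 if it held throughout; the final
   check is made with i == n, i.e. it is the postcondition. */
int init_and_check(int *t, size_t n) {
    size_t i;
    for (i = 0; i < n; i++) {
        if (!invariant_holds(t, n, i)) return 0;
        t[i] = 0;
    }
    return invariant_holds(t, n, i);
}
```

Such a test only exercises one execution; the point of the article is to
infer the quantified part of the invariant statically.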


Contribution We propose a generic method for analyzing array programs,
which can be implemented i) as a normal abstract domain, or ii) by
translating the program with arrays into a scalar program (a program without
arrays), analyzing this program by any method producing invariants (the
back-end), and then recovering the array properties. Its precision depends on
the back-end analysis. Our method has tunable precision, is formalized by
Galois connections [1^ and, contrary to most others, is not guided by a
target property (here $\forall k.\ 0 \le k < n \rightarrow t[k] = 0$), though
it can take advantage of it. It can therefore be used to supply information
to the end-user (“what does this program do?”) as opposed to being useful
only for proving properties. We demonstrate the flexibility of our approach
on examples, using the acceleration procedure Flata, the abstract interpreter
ConcurInterproc and CPAchecker as back-ends.

We also show a form of completeness: for any loop-free program, the
precision of the analysis can be chosen so that it is exact with respect to
universally quantified array properties (§4.3).

Our approach also applies to general maps keys → values, though certain
optimizations apply only to totally ordered index types.


Contents Section 2 introduces our approach on one example. Section 3
discusses the Galois connections, and Section 4 gives the formal definition of
our transformation algorithm and associated correctness and partial
completeness proofs. Section 5 discusses the use of various back-ends on more
examples. We finish with related work and the conclusion.

¹ Possible since Astrée targets safety-critical embedded systems where array
sizes are typically fixed at system design and dynamic memory allocation is
prohibited.

2 Example: the Sentinel

Our program transformation consists of i) a replacement of reads and writes
parameterized by a number of distinguished indices, formalized in Section 4;
ii) optionally, some “focusing” on a subset of index values; iii) for certain
back-ends (ConcurInterproc), the addition of observer variables implementing
a form of partitioning.

Listing 2: A “sentinel value” marks the penultimate array cell
const int N=1000; int i = 0, t[N];
initialize(N, t); t[N-2] = -1;
while (t[i]>=0) i++;

Obviously to us humans, this program cannot crash with an array access out of
bounds, and the final value of i is, at most, 998 (its value depends on how
the “initialize” procedure works). How can we obtain this result
automatically?

Let x be a symbolic constant in {0, ..., N − 1}. We abstract array t by
the single cell t[x], represented by variable tx: reads and writes at position
x in t translate to reads and writes to variable tx, and reads and writes at
other positions are ignored. Listing 2 is thus abstracted as:²

const int N=1000, x = random(); assume(x >= 0 && x < N);
int i = 0, tx = random(); if (N-2 == x) tx = -1;
while (1) { int read = random(); if (i == x) { read = tx; }
  if (read < 0) { break; } i = i+1; }

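Flata can compute an exact input/output relation of this program
(to demonstrate generality, we left N unfixed and replaced N-2 by a parameter
p; we thus use a precondition $0 \le x < N \land 0 \le p < N$):

$$
\begin{array}{l}
(p = x \land i \le x-1 \land i \ge 0 \land N \ge x+1) \lor
(i = x \land i \ge 0 \land N \ge p+1 \land i \le p-1) \lor {} \\
(x \ge p+1 \land i \le x-1 \land i \ge 0 \land N \ge x+1 \land p \ge 0) \lor
(i = x \land i \le N-1 \land i \ge p+1 \land p \ge 0) \lor {} \\
(i \ge x+1 \land N \ge p+1 \land i \le N-1 \land x \le p-1 \land x \ge 0) \lor {} \\
(i \le x-1 \land i \ge 0 \land N \ge p+1 \land x \le p-1) \lor {} \\
(i = x \land i = p \land i \ge 0 \land i \le N-1) \lor
(x \ge p+1 \land i \ge x+1 \land i \le N-1 \land p \ge 0)
\end{array}
\qquad (F)
$$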

Note that our abstraction is valid whatever the value of x. This means that
(i, p, N) should be a solution of
$N > 0 \land \forall x\, (0 \le x < N \Rightarrow F)$. One can check that
this quantified formula entails $i \le p$.

Arguably, we have done too much work: the only cell in the array whose
content matters much is at index p (N-2 in the original program). Running
Flata with x = p yields a postcondition implying $i \le p$. Again, this is
sound, because any choice of x yields a valid postcondition on (i, p).
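
The focused choice x = p can also be explored by plain testing. The following
is a minimal sketch, not the authors' tool chain: `next_value` is a
hypothetical deterministic stand-in for random(), and the bound `i < n` on
the loop is an addition of this sketch (the abstracted program above omits
bounds tests). It runs the abstracted program and checks $i \le p$ on a
number of pseudo-random executions.

```c
#include <stdint.h>

/* Hypothetical stand-in for random(): a small deterministic generator
   returning values in [-100, 100]. */
static int next_value(uint32_t *state) {
    *state = *state * 1664525u + 1013904223u;
    return (int)((*state >> 16) % 201u) - 100;
}

/* Abstracted sentinel program with distinguished index x and sentinel
   position p. The loop bound i < n is an addition of this sketch. */
int sentinel_abstract(int n, int p, int x, uint32_t seed) {
    uint32_t st = seed;
    int i = 0;
    int tx = next_value(&st);          /* unknown initial content of t[x] */
    if (p == x) tx = -1;               /* abstraction of t[p] = -1 */
    while (i < n) {
        int read = next_value(&st);    /* read at an untracked position */
        if (i == x) read = tx;         /* read at the tracked position */
        if (read < 0) break;
        i = i + 1;
    }
    return i;
}

/* Focused runs (x = p): returns 1 if i <= p for seeds 0 .. seeds-1. */
int focused_bound_holds(int n, int p, uint32_t seeds) {
    for (uint32_t s = 0; s < seeds; s++)
        if (sentinel_abstract(n, p, p, s) > p) return 0;
    return 1;
}
```

Testing of course only samples executions, whereas an acceleration or
abstract-interpretation back-end covers all of them.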


3 Galois connections 

We shall now see that, for any choice of indices, there is a Galois connection
$\overset{\gamma}{\underset{\alpha}{\leftrightarrows}}$ between the concrete
set of states (the set of possible values of the vector of variables of the
original program) and the abstract set of states (the set of possible values
of the vector of variables in the transformed program). In general, this
Galois connection is not onto: there are abstract elements $x^\sharp$ that
include “spurious” states, and which may be reduced to a strictly smaller
$\alpha \circ \gamma(x^\sharp)$.

² We have left out, for the sake of brevity, tests for array accesses out of
bounds.

If A and B are sets, $A \to B$ denotes the set of total functions from A to
B, and $\mathcal{P}(A)$ the set of parts of A. If A is finite, $A \to B$
denotes the set of arrays indexed by A; specifically, if A is
$\{1, \dots, l_1\} \times \dots \times \{1, \dots, l_d\}$ then $A \to B$
denotes the d-dimensional arrays of size $(l_1, \dots, l_d)$. $f[x]$ denotes
the application $f(x)$ where f is a program array or map.

Our constructions easily generalize to arbitrary combinations of numbers of 
arrays and numbers of indices; let us see a few common cases. 


3.1 Single index 


Applied with a single index, our map abstraction is classical [1^, § 2.1].


Definition Let $f \in A \to B$, we abstract it by its graph
$\alpha_1(f) = \{(a, f[a]) \mid a \in A\}$; e.g., a constant array
$\{1, \dots, n\} \to \mathbb{Z}$ with value 42 is abstracted as
$\{(i, 42) \mid 1 \le i \le n\}$.

We lift $\alpha_1$ (while keeping the same notation) to a function from
$\mathcal{P}(A \to B)$ to $\mathcal{P}(A \times B)$: for
$F \subseteq A \to B$, $\alpha_1(F) = \bigcup_{f \in F} \alpha_1(f)$,
otherwise said

$$\alpha_1(F) = \{(a, f[a]) \mid a \in A, f \in F\} \qquad (1)$$

Let $F^\sharp \subseteq A \times B$. Then we define its concretization
$\gamma_1(F^\sharp)$:

$$\gamma_1(F^\sharp) = \{f \in A \to B \mid \forall a \in A\ (a, f[a]) \in F^\sharp\} \qquad (2)$$

It is easy to see that
$(\mathcal{P}(A \to B), \subseteq) \overset{\gamma_1}{\underset{\alpha_1}{\leftrightarrows}} (\mathcal{P}(A \times B), \subseteq)$
is a Galois connection.

Non-surjectivity and reduction Remark that $\alpha_1$ is not onto (if
$|A| > 1$ and $|B| > 0$): there exist multiple $F^\sharp$ such that
$\gamma_1(F^\sharp) = \emptyset$, namely all those such that
$\exists a \in A\ \forall b \in B\ (a, b) \notin F^\sharp$. For instance, if
considering arrays of two integer elements ($A = \{0, 1\}$,
$B = \mathbb{Z}$), then $F^\sharp = \{(1, 0)\}$ yields
$\gamma_1(F^\sharp) = \emptyset$: there is no way to fill the array at index 0.

Let us now see the practical implication. Assume that the program has a
single array in $A \to B$ and a vector of scalar variables ranging in S, then
the memory state is an element of $X = S \times (A \to B)$. The scalar
variables are combined into our abstraction as follows:

$$\mathcal{P}(S \times (A \to B)) \cong S \to \mathcal{P}(A \to B)
\overset{\gamma_1^S}{\underset{\alpha_1^S}{\leftrightarrows}}
S \to \mathcal{P}(A \times B) \cong \mathcal{P}(S \times A \times B) = X^\sharp \qquad (3)$$

where $\alpha_1^S$ and $\gamma_1^S$ lift $\alpha_1$ and $\gamma_1$ pointwise.
Let $s \in S$. While the absence of any $(s, a, b) \in x^\sharp$
($x^\sharp \in X^\sharp$) indicates that there is no
$(s, f) \in \gamma_1^S(x^\sharp)$, that is, scalar state s is unreachable,
the converse is not true. Consider a single integer scalar variable s and an
array a of length 2, and $x^\sharp = \{(0, 0, 1), (1, 0, 0), (1, 1, 2)\}$,
representing the triples $(s, i, a[i])$. It would seem that s = 0 is
reachable, but it is not, because there is no way to fill the array at
position 1; there is no element in $x^\sharp$ of the form $(0, 1, b)$.

A reduction is a function $\rho : X^\sharp \to X^\sharp$ such that
$\gamma \circ \rho = \gamma$ and $\rho(x^\sharp) \subseteq x^\sharp$ for all
$x^\sharp$. The strongest reduction $\rho_{opt}$ (the minimum for the
pointwise ordering induced by $\subseteq$) is $\alpha \circ \gamma$. In the
above, $\rho_{opt}(x^\sharp) = \{(1, 0, 0), (1, 1, 2)\}$; intuitively,
the strongest reduction discards all superfluous elements from the abstract
value.
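
For a finite abstract value given as an explicit list of triples, this
reduction can be computed directly: a triple survives exactly when its scalar
state s has at least one candidate value at every index. The sketch below
assumes indices 0 .. len-1 and uses hypothetical names (`triple`,
`reduce_single_index`); the analysis itself works on symbolic
representations rather than explicit lists.

```c
#include <stddef.h>

typedef struct { int s, a, b; } triple;   /* encodes (s, a, f[a]) */

/* Does xs contain some triple (s, a, _)? */
static int has_cell(const triple *xs, size_t n, int s, int a) {
    for (size_t k = 0; k < n; k++)
        if (xs[k].s == s && xs[k].a == a) return 1;
    return 0;
}

/* Keeps the triples whose scalar state s admits a value at every index
   0 .. len-1; writes them to out (capacity >= n) and returns their number. */
size_t reduce_single_index(const triple *xs, size_t n, int len, triple *out) {
    size_t kept = 0;
    for (size_t k = 0; k < n; k++) {
        int fillable = 1;
        for (int a = 0; a < len; a++)
            if (!has_cell(xs, n, xs[k].s, a)) { fillable = 0; break; }
        if (fillable) out[kept++] = xs[k];
    }
    return kept;
}
```

On the three-element example above with len = 2, only the two triples with
s = 1 remain.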

Class of formulas Assume now that the vector of scalar variables
$s_1, \dots, s_m$ lies within $S = \mathbb{Z}^m$, the index a lies in
$\{1, \dots, l_1\} \times \dots \times \{1, \dots, l_d\}$, and the values
$f[a]$ also lie in $\mathbb{Z}$. Consider a formula $\psi$ of the form

$$\forall a_1, \dots, a_d\ \ \phi(s_1, \dots, s_m, a_1, \dots, a_d, f[a_1, \dots, a_d]) \qquad (4)$$

where $\phi$ is a first-order arithmetic formula (say, Presburger).

Then, $f \models \psi$ if and only if
$\alpha_1^S(f) \subseteq \{((s_1, \dots, s_m), (a_1, \dots, a_d), b) \mid \phi(s_1, \dots, s_m, a_1, \dots, a_d, b)\}$.
The sets of program states expressible by formulas of form (4) thus map
through the Galois connection to a sub-lattice of
$\mathcal{P}(\mathbb{Z}^m \times \mathbb{Z}^d \times \mathbb{Z})$.
This construction may be generalized to any theory or combination of theories
over the sorts used for scalar variables, array indices, and array contents.

Checking that an invariant $\gamma_1^S(G)$ entails $\psi$, when the set G is
defined by a formula P, just amounts to checking that $P \land \neg\phi$ is
unsatisfiable.

3.2 Several indices, one per array 

The above settings can be extended to several arrays. Let
$f, g \in A \to B$, we abstract them by the product of their graphs
$\alpha_1(f, g) = \{(a, f[a], a', g[a']) \mid a, a' \in A\}$,
$\gamma_1(x^\sharp) = \{(f, g) \in (A \to B)^2 \mid \forall a, a' \in A\ (a, f[a], a', g[a']) \in x^\sharp\}$.

This abstraction can express properties of the form

$$\forall a_1, \dots, a_d, a'_1, \dots, a'_d\ \ \phi(s_1, \dots, s_m, a_1, \dots, a_d, f[a_1, \dots, a_d], a'_1, \dots, a'_d, g[a'_1, \dots, a'_d])$$

As an example, the property that up to index k, monodimensional array f of
length n has been copied into array g can be expressed as
$\forall a, a' \in \{1, \dots, n\}\ \ a < k \land a = a' \Rightarrow f[a] = g[a']$
within that class.

3.3 Dual indices, same array 

Definition Let $f \in A \to B$, pose
$\alpha_2(f) = \{(a, f[a], a', f[a']) \mid a, a' \in A\}$ and lift it to a
function from $\mathcal{P}(A \to B)$ to $\mathcal{P}((A \times B)^2)$. Let
$F^\sharp \subseteq (A \times B)^2$. Then we define its concretization
$\gamma_2(F^\sharp)$:

$$\gamma_2(F^\sharp) = \{f \in A \to B \mid \forall a, a' \in A\ (a, f[a], a', f[a']) \in F^\sharp\} \qquad (5)$$

It is easy to see that
$(\mathcal{P}(A \to B), \subseteq) \overset{\gamma_2}{\underset{\alpha_2}{\leftrightarrows}} (\mathcal{P}((A \times B)^2), \subseteq)$
is a Galois connection.

If A is totally ordered, it seems a waste to include both
$(a, f[a], a', f[a'])$ and $(a', f[a'], a, f[a])$ in the abstraction for
$a < a'$. We thus define
$\alpha_{2<}(f) = \{(a, f[a], a', f[a']) \mid a < a' \in A\}$ and
$\gamma_{2<}(x^\sharp) = \{f \in A \to B \mid \forall a, a' \in A,\ a < a' \Rightarrow (a, f[a], a', f[a']) \in x^\sharp\}$.


Non-surjectivity Remark, again, that $\alpha_2$ is not onto. Consider an
array of integers of length 3, that is, a function
$f : \{1, 2, 3\} \to \mathbb{Z}$. An analysis computes its abstraction as
$x^\sharp = \{(1, 0, 2, 0), (1, 0, 3, 0), (2, 0, 3, 0), (1, 0, 3, 1)\}$;
recall that each element of that set purports to denote
$(a, f[a], a', f[a'])$ for $a < a'$. At first sight, it seems that
$f(3) = 1$ is possible, as witnessed by the last element. Yet, there is then
no way to fill $f[2]$: there is no x such that $(2, x, 3, 1) \in x^\sharp$.
This last element is therefore superfluous, and we can conclude that
$\forall x\ f[x] = 0$. (See §5.5 for a real-life example.)

If $x^\sharp$ is defined by a first-order formula
($x^\sharp = \{(a, b, a', b') \mid \phi(a, b, a', b')\}$), then this
reduction (removing all $a', b'$ such that for some $a < a'$ there is no way
to fill $f[a]$) is obtained as:
$\forall a\, \exists b\ \ a < a' \Rightarrow \phi(a, b, a', b')$.
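
On an explicit finite set of quadruples, one pass of this rule can be
sketched as follows. This is an illustration under assumptions: the names
`quad` and `reduce_dual_index` are hypothetical, the lowest index is passed
as `first`, and only the one-sided rule stated above is applied (it is not
claimed to compute the strongest reduction in general).

```c
#include <stddef.h>

typedef struct { int a, b, a2, b2; } quad;   /* (a, f[a], a', f[a']), a < a' */

/* Does xs contain some quadruple (a, _, a2, b2)? */
static int has_left(const quad *xs, size_t n, int a, int a2, int b2) {
    for (size_t k = 0; k < n; k++)
        if (xs[k].a == a && xs[k].a2 == a2 && xs[k].b2 == b2) return 1;
    return 0;
}

/* Keeps (a, b, a', b') only if every index first <= a0 < a' has some value
   b0 with (a0, b0, a', b') in xs. Writes the survivors to out (capacity
   >= n) and returns their number. */
size_t reduce_dual_index(const quad *xs, size_t n, int first, quad *out) {
    size_t kept = 0;
    for (size_t k = 0; k < n; k++) {
        int ok = 1;
        for (int a = first; a < xs[k].a2; a++)
            if (!has_left(xs, n, a, xs[k].a2, xs[k].b2)) { ok = 0; break; }
        if (ok) out[kept++] = xs[k];
    }
    return kept;
}
```

On the length-3 example, the quadruple (1, 0, 3, 1) is dropped because no
quadruple of the form (2, x, 3, 1) exists.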

Class of formulas Assume now that the vector of scalar variables
$s_1, \dots, s_m$ lies within $S = \mathbb{Z}^m$, the indices $a < a'$ lie in
$\{1, \dots, n\}$, and the values $f[a], f[a']$ also lie in $\mathbb{Z}$.
Consider a formula $\psi$ of the form
$\forall a, a'\ \ a < a' \Rightarrow \phi(s_1, \dots, s_m, a, f[a], a', f[a'])$
where $\phi$ is a first-order arithmetic formula (say, Presburger). For
instance, one may express sortedness:
$\forall a, a'\ \ a < a' \Rightarrow f[a] \le f[a']$.

Then, $f \models \psi$ if and only if
$\alpha_{2<}^S(f) \subseteq \{((s_1, \dots, s_m), a, b, a', b') \mid \phi(s_1, \dots, s_m, a, b, a', b')\}$.
The sets of program states expressible by formulas of the form
$\forall a, a'\ \ a < a' \Rightarrow \phi(s_1, \dots, s_m, a, f[a], a', f[a'])$
thus map through the Galois connection to a sub-lattice of
$\mathcal{P}(\mathbb{Z}^m \times (\mathbb{Z} \times \mathbb{Z})^2)$.

4 Abstraction of program semantics 

Our analysis may be implemented by a syntactic transformation of array
operations into purely scalar operations. In this section, for each operation
(read, write) we describe the transformed operation and demonstrate the
correctness of the transformation. We then discuss precision.

Without loss of generality, we consider only elementary reads and writes
(r=f[i]; and f[i]=r; with i a variable). More complex constructs, e.g.
f[e]=r; with e an expression, can always be decomposed into a sequence of
scalar operations and elementary reads and writes, using temporary variables.

4.1 Transformation and Correctness 

Reading from the array Consider a program state composed of $(s, r, i, f)$
where $r \in B$, $i \in A$ are scalars, $s \in S$ is the rest of the state,
and $f \in A \to B$. Consider the instruction r=f[i];, its semantics is:

$$(s, r, i, f) \to (s, f[i], i, f) \qquad (6)$$

We wish to abstract it by the program fragment:

Listing 3: Read from array
r = random(); if (i==a) { r=b; }

Lemma 1. The forward and backward semantics of Listing 3 abstract the
forward and backward semantics of r=f[i]; by the
$(\alpha_1^S, \gamma_1^S)$ Galois connection.

More generally, a read with several indices $a_1, a_2, \dots$ is abstracted by
r=random(); if (i==a1) assume(r==b1); if (i==a2) assume(r==b2); ...
The same lemma and proof carry to that setting.

Writing to the array Consider the instruction f[i]=r;, its semantics is:

$$(s, r, i, f) \to (s, r, i, f[i \mapsto r]) \qquad (7)$$

We wish to abstract it by the program fragment:

Listing 4: Write to array

if (i==a) { b=r; }

Lemma 2. The forward and backward semantics of Listing 4 abstract the
forward and backward semantics of f[i]=r; by the
$(\alpha_1^S, \gamma_1^S)$ Galois connection.

The same carries over to writing to an array with several indices, abstracted as:

Listing 5: Write to array, multiple indexes
if (i==a1) { b1=r; } if (i==a2) { b2=r; } ...
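
The single-index read and write rules can be packaged as two small helper
functions. This is a sketch with hypothetical names (`cell`, `abs_read`,
`abs_write`); the nondeterministic value that random() would return is passed
explicitly as `havoc` so that the behaviour is reproducible.

```c
/* One distinguished cell: index a and the tracked value b = f[a]. */
typedef struct { int a; int b; } cell;

/* Abstract read r = f[i]: exact at the distinguished index, otherwise the
   caller-supplied nondeterministic value (standing in for random()). */
int abs_read(const cell *c, int i, int havoc) {
    int r = havoc;
    if (i == c->a) r = c->b;
    return r;
}

/* Abstract write f[i] = r: updates b only at the distinguished index;
   writes at any other index are ignored. */
void abs_write(cell *c, int i, int r) {
    if (i == c->a) c->b = r;
}
```

A write followed by a read at the distinguished index returns the written
value, while a read elsewhere returns whatever `havoc` supplies, which is
where the abstraction loses information.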

Operations on scalars Consider a program state composed of $(s, f)$ where
$f \in A \to B$ is an array and $s \in S$ is the rest of the state. Consider
a scalar instruction P with $s \xrightarrow{P} s'$ and thus
$(s, f) \xrightarrow{P} (s', f)$. We abstract P as $P^\sharp$:
$(s, a, b) \xrightarrow{P^\sharp} (s', a, b)$ if $s \xrightarrow{P} s'$.
Essentially, operations on scalars are abstracted by themselves. The
following result generalizes immediately to $(\alpha_2, \gamma_2)$ etc.

Lemma 3. The forward and backward semantics of $\xrightarrow{P^\sharp}$
abstract those of $\xrightarrow{P}$ by the $(\alpha_1^S, \gamma_1^S)$ Galois
connection.

4.2 Precision loss 

“Forgetting” the value of a scalar variable v corresponds to
$(s, v, f) \to (s, f)$. This scalar operation may be correctly abstracted, as
in Lemma 3, by $(s, v, a, b) \to (s, a, b)$. Surprisingly, applying this
operation not only forgets the value of v, it may also enlarge the set of
represented f.

Example: $x^\sharp = \{(0, v, a, v) \mid a \in A \land v \in B\}$ abstracts
by $(\alpha_1^S, \gamma_1^S)$ the set of triples $(0, v, f)$ where f is a
constant function of value v. Forgetting v yields the set of pairs $(0, f)$
where f is a constant function. Applying $(s, v, a, b) \to (s, a, b)$ to
$x^\sharp$ yields $y^\sharp = \{(0, a, v) \mid a \in A \land v \in B\}$,
which concretizes to the set $\{(0, f) \mid f \in A \to B\}$. We have
completely lost the “constantness” property.







4.3 Relative completeness 

We now consider the problem of completeness of this abstraction, assuming that 
the back-end analysis is perfectly precise (thus relative completeness). 

Our analysis is incomplete in general. Consider the following program: 

Listing 6: Fill with zero, test zero
int t[N]; for(int i=0; i<N; i++) t[i] = 0;
for(int i=0; i<N; i++) if (t[i] != 0) break;

In the second loop, the break statement is never reached and thus at the end
of the loop, i = N. Yet, if we distinguish n < N different indices
$i_1, \dots, i_n$, we cannot prove that this statement is never reached: for
there will exist $i \in \{0, \dots, N-1\} \setminus \{i_1, \dots, i_n\}$ such
that t[i] returns, in the abstracted program, an arbitrary value and thus the
break statement is considered possibly reachable.

In contrast, when the program is loop-free, the abstraction is exact with 
respect to the scalar variables, provided the number of indices used for the 
abstraction is at least the number of array accesses: 

Theorem 1. Consider a loop-free array program P with arrays
$a_1, \dots, a_d$ such that the number of accesses to these arrays are
respectively $\alpha_1, \dots, \alpha_d$. By abstracting these arrays with,
respectively, $n_1, \dots, n_d$ indices such that $n_i \ge \alpha_i$ for all
i, we obtain a Galois connection such that
$\pi_S \circ \gamma \circ P^\sharp \circ \alpha = \pi_S \circ P$,
where $\pi_S$ is the projection of the state to the scalar variables.

This completeness result extends to universally quantified array properties
$\forall i_1, \dots, i_n\ P(i_1, \dots) \rightarrow Q(a_1[i_1], \dots)$: one
appends to the original program (assuming $i_1, \dots, i_n$ are fresh,
nondeterministically initialized):

assume(P(i1, ...)); assert(Q(a1[i1], ...));


5 More examples 

5.1 Matrix initialization 

Listing 7: Initialization of m × n matrix a with value v

void array_init_2d(int m, int n, int a[m][n], int v) {
  for(int i = 0; i < m; i++) {
    for(int j = 0; j < n; j++) a[i][j] = v; } }

Again, we consider cell a[x,y], where $0 \le x < m$ and $0 \le y < n$, and
disregard all other cells. One should not convert this procedure into a single
control-flow graph, because the resulting numerical transition system does not
have the “flat” structure expected by Flata. Instead, one must encode the
inner loop as a separate procedure:

void array_init_2d(int m, int n, int a, int v, int x, int y) {
  assume(x >= 0 && x < m);
  assume(y >= 0 && y < n);
  for(int i=0; i<m; i++) inner_loop(n, a, v, x, y, i); }

void inner_loop(int n, int a, int v, int x, int y, int i) {
  for(int j=0; j<n; j++) if (x==i && y==j) a = v; }

Flata then computes the exact input-output relation of inner_loop, and finally
the exact input-output relation of array_init_2d:

$$
\begin{array}{l}
(x = 0 \land m = 1 \land a' = v \land y \ge 0 \land n \ge y+1) \lor
(a' = v \land x \ge 1 \land y \ge 0 \land m \ge x+1 \land n \ge y+1) \lor {} \\
(n = 1 \land x = 0 \land y = 0 \land a' = v \land m \ge 2) \lor
(x = 0 \land a' = v \land y \ge 0 \land m \ge 2 \land n \ge 2 \land n \ge y+1)
\end{array}
$$

Each disjunct implies a' = v, i.e., the final value of a[x,y] is v. Again,
because (x, y) are symbolic constants with no assumption except that they are
valid indices for a, this proves that all cells contain v. Assuming
$0 \le x < m \land 0 \le y < n$ this formula may indeed be simplified
automatically into a' = v.³
5.2 Slice initialization 

Listing 8: Initialize a[low ... high − 1] to v

void slice_init(int n, int a[n], int low, int high, int v) {
  for(int i=low; i<high; i++) a[i] = v; }

Again, we transform the program using a single index:

for(int i=low; i<high; i++) if (x == i) a = v;

Flata produces as postcondition (assuming
$0 \le x < n \land 0 \le low \le high \le n$):

$$
\begin{array}{l}
(high = low \land a' = a \land high \ge 0 \land n \ge high \land n \ge x+1 \land x \ge 0) \lor {} \\
(a' = v \land low \le x \land n \ge high \land high \ge x+1 \land low \ge 0) \lor {} \\
(a' = a \land n \ge high \land high \ge low+1 \land low \ge x+1 \land x \ge 0) \lor {} \\
(a' = a \land high \le x \land n \ge x+1 \land high \ge low+1 \land low \ge 0)
\end{array}
\qquad (8)
$$

Again, under the assumptions $0 \le x < n$ and $0 \le low \le high \le n$,
this formula is equivalent to:
$((low \le x < high) \rightarrow a' = v) \land (\neg(low \le x < high) \rightarrow a' = a)$.
Thus by quantification, the expected outcome:

$$(\forall x \in [low, high)\ \ a'[x] = v) \land (\forall x \notin [low, high)\ \ a'[x] = a[x]) \qquad (9)$$
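
The simplified postcondition can be cross-checked by running the transformed
loop for every choice of the distinguished index x. The sketch below uses
hypothetical function names and plain enumeration in place of a back-end
analysis.

```c
/* Transformed slice_init: a is the tracked content of cell x.
   Returns the final value a'. */
int slice_init_abs(int low, int high, int a, int v, int x) {
    for (int i = low; i < high; i++)
        if (x == i) a = v;
    return a;
}

/* Checks, for every x in 0 .. n-1, that a' = v when low <= x < high and
   a' = a otherwise. Returns 1 if all cases agree. */
int slice_post_holds(int n, int low, int high, int a, int v) {
    for (int x = 0; x < n; x++) {
        int expected = (low <= x && x < high) ? v : a;
        if (slice_init_abs(low, high, a, v, x) != expected) return 0;
    }
    return 1;
}
```

Because x ranges over all valid indices, agreement for every x corresponds
to the universally quantified statement obtained by quantification.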

5.3 Array copy 

Listing 9: Copy array a into array b

void array_copy(int n, int a[n], int b[n]) {
  for(int i=0; i<n; i++) b[i] = a[i]; }

Take a single cell a[x] in a and a single cell b[y] in b; after transformation:

int n, a, b, x, y, tmp;
assume(0 <= x && x < n && 0 <= y && y < n);
for(int i=0; i<n; i++) { if (x==i) tmp=a; if (y==i) b=tmp; }

Flata Flata yields:
$(y \ge x+1 \land n \ge y+2 \land x \ge 0) \lor (n = y+1 \land y \ge x+1 \land x \ge 0) \lor (n = x+1 \land y \ge 0 \land y \le x-1) \lor (y \ge 0 \land y \le x-1 \land n \ge x+2) \lor (y = x \land b' = a \land n \ge x+2 \land x \ge 0) \lor (y = x \land b' = a \land n = x+1 \land x \ge 0)$.
Assuming $0 \le x < n \land 0 \le y < n$, this is equivalent to
$x = y \Rightarrow b' = a$. Thus by quantification,
$\forall x, y.\ x = y \rightarrow a[x] = b[y]$, simplifiable into
$\forall x.\ a[x] = b[x]$.

³ We implemented a simplification algorithm for quantifier-free Presburger
arithmetic inspired by [38] so as to understand the output of Flata and
ConcurInterproc.
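
The same kind of enumeration illustrates the copy example. In this sketch
(hypothetical names throughout) the temporary is refreshed from an explicit
`havoc` value at each iteration, which is an assumption of the sketch: it
models a temporary local to the loop body.

```c
/* Transformed array_copy: a tracks a[x], b tracks b[y]. The temporary is
   reset to `havoc` at each iteration (assumption of this sketch).
   Returns the final value b'. */
int array_copy_abs(int n, int a, int b, int x, int y, int havoc) {
    for (int i = 0; i < n; i++) {
        int tmp = havoc;
        if (x == i) tmp = a;
        if (y == i) b = tmp;
    }
    return b;
}

/* Checks b' = a for every x = y in 0 .. n-1. Returns 1 if it always holds. */
int copy_post_holds(int n, int a, int b, int havoc) {
    for (int x = 0; x < n; x++)
        if (array_copy_abs(n, a, b, x, x, havoc) != a) return 0;
    return 1;
}
```

When x and y differ, the final b' is the untracked value, which matches the
absence of any constraint on b' in the corresponding disjuncts.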

Software model checking Many software model checkers, including CPAchecker⁴,
do not handle universally quantified array properties; yet we can use them as
back-end analyses! We translate the target property (here
$\forall x.\ 0 \le x < n \Rightarrow a[x] = b[x]$) into a precondition x = y
and an assertion on the postcondition a = b. CPAchecker then proves the
property.⁵

int main () { 

int n, a, b, x, y; 

if (0 <= x && x < n && 0 <= y && y < n && x==y) { 
for(int i=0; i<n; i++) { 

int tmp; if (x==i) tmp=a; if (y==i) b=tmp; } 
assert(a==b); } } 


5.4 In-place array reversal 

Listing 10: Array reversal

void array_reverse_inplace(int n, contents t[n]) {
  int i=0, j=n-1;
  while (i < j) {
    contents tmp1 = t[i], tmp2 = t[j];
    t[i] = tmp2; t[j] = tmp1; i++; j--; } }

For this program, we need to distinguish the initial values in the array from
the values during the computation (which finally yield the final values). We
use three indices $0 \le x < n$, $0 \le y < z < n$: a is the initial value of
t[x], b the current value of t[y], c the current value of t[z].

For each read, we check if the index of the read is equal to y (respectively,
z) and return b (respectively, c) if this is the case. If the index is equal
to both y and z, it is sound to return either b or c; we chose to return b.
For each write, we test if the index is equal to y, in which case we write to
b, and equal to z, in which case we write to c. If it is equal to both y and
z, we write to both b and c.


Listing 11: Array reversal, transformed
contents a, b, c;
int x, y, z, i=0, j=n-1;
if (y == x) b = a; if (z == x) c = a;
while (i < j) { contents tmp1, tmp2;
  if (i == y) tmp1 = b; else if (i == z) tmp1 = c;
  if (j == y) tmp2 = b; else if (j == z) tmp2 = c;
  if (i == y) b = tmp2; if (i == z) c = tmp2;
  if (j == y) b = tmp1; if (j == z) c = tmp1;
  i++; j--;
}

⁴ http://cpachecker.sosy-lab.org/

⁵ scripts/cpa.sh -predicateAnalysis after preprocessing with assert.h


Flata Flata takes 480 s⁶ to process this program, and outputs an
input-output relation $\phi$ in disjunctive normal form with 292 disjuncts
(not reprinted). The output formula is very complicated, with explicit
enumeration of many particular cases; the reason for the slowness and the
size of the output formula seems to be that Flata explicitly enumerates many
cases up to saturation, with no attempt at intermediate simplifications. We
shall now explain what this formula entails.

Let U be $0 \le x, y, z < n \land y + z = n - 1$. Let $U_<$ be
$U \land y < z \land z = x \land y + z = n - 1$, then $\phi \land U_<$ is
equivalent to $a = b \land U_<$. This means that under the precondition
$U_<$, Listing 11 has exact postcondition a = b. By universal quantification,
this means that $\forall x, y, z.\ U_< \rightarrow t[x] = t'[y]$, where t is
the input array to Listing 11 and t' the output. This formula may be
simplified into
$\forall x.\ 0 \le x \land 2x \le n - 2 \rightarrow t[x] = t'[n-1-x]$. We can
obtain similar formulas for the cases y > z and y = z. The three cases can be
summarized into

$$\forall x.\ 0 \le x < n \rightarrow t[x] = t'[n - 1 - x] \qquad (10)$$

Flata, focused The above execution time and the complexity of the resulting
formula seem excessive, if all that matters is when
$(x = y \lor x = z) \land y + z = n - 1$. Indeed, some easy static analysis
(by Flata or another tool) shows that the array accesses within the loop are
done at indices i and j that satisfy $0 \le i < j < n$ and $i + j = n - 1$.
Such a pre-analysis suggests targeting the main analysis at two positions
t[y] and t[z] in the current array, satisfying $0 \le y < z < n$ and
$y + z = n - 1$. The only positions t[x] that matter in the original array
are those that can be read precisely, that is, x = y and x = z.

We therefore re-run the analysis with precondition U:
$(0 \le y < z < n \land y + z = n - 1 \land x = y)$. Flata runs for 6 s and
outputs a formula with 8 disjuncts, with a = c in all disjuncts. We thus have
proved that $\forall x, y, z.\ U \rightarrow t[x] = t'[z]$, which can be
simplified into
$\forall z.\ 2z > n - 1 \land z < n \Rightarrow t'[z] = t[n - 1 - z]$.
We may also run with the precondition
$(0 \le y < z < n \land y + z = n - 1 \land x = z)$ and get the remainder of
the cases to conclude as in Formula (10).
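
The two focused preconditions can be tried on the transformed reversal
program by direct execution. The following sketch is only an illustration:
the names are hypothetical, `contents` is taken to be `int`, and
uninitialized or untracked values are replaced by an explicit `havoc`
argument.

```c
typedef struct { int b, c; } tracked;   /* current t[y] and t[z] */

/* Transformed array reversal with distinguished indices x (initial array)
   and y, z (current array). Untracked values are modelled by `havoc`. */
tracked array_reverse_abs(int n, int a, int x, int y, int z, int havoc) {
    tracked t = { havoc, havoc };
    int i = 0, j = n - 1;
    if (y == x) t.b = a;
    if (z == x) t.c = a;
    while (i < j) {
        int tmp1 = havoc, tmp2 = havoc;
        if (i == y) tmp1 = t.b; else if (i == z) tmp1 = t.c;
        if (j == y) tmp2 = t.b; else if (j == z) tmp2 = t.c;
        if (i == y) t.b = tmp2;
        if (i == z) t.c = tmp2;
        if (j == y) t.b = tmp1;
        if (j == z) t.c = tmp1;
        i++; j--;
    }
    return t;
}

/* For all 0 <= y < z < n with y + z = n - 1: x = y must give c = a, and
   x = z must give b = a. Returns 1 if both hold in every case. */
int focused_reversal_holds(int n, int a, int havoc) {
    for (int y = 0; 2 * y < n - 1; y++) {
        int z = n - 1 - y;
        if (array_reverse_abs(n, a, y, y, z, havoc).c != a) return 0;
        if (array_reverse_abs(n, a, z, y, z, havoc).b != a) return 0;
    }
    return 1;
}
```

Each call fixes one triple (x, y, z); the back-end analyses instead treat
the indices symbolically and cover all admissible triples at once.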

To summarize, when the exact analysis of the transformed program (that
is, an exact analysis in the back-end) is too costly, one may choose to focus
the analysis by restricting the range of the indices (x, y, z, ...) to some
area U considered to be “meaningful”, for instance obtained by pre-analysis
of the relationships between the indices of the array accesses in the
program. This is sound, since the quantification in the resulting formula is
over the indices satisfying U. Thus, a bad choice for U may only result in a
sound, but uninteresting invariant (the worst case is to take an
unsatisfiable U: we then obtain a formula talking about an empty set of
positions in the arrays, thus a tautology).

⁶ All timings using one core of a 2.4 GHz Intel® Core™ i3 running 32-bit
Linux.


ConcurInterproc, focused Interproc⁷ applies classical abstract
interpretation (Kleene iteration accelerated with widenings, with possible
narrowing iterations) over a variety of numerical abstract domains provided
by the Apron library⁸ (intervals, “octagons” [37], convex polyhedra [2; ...).

ConcurInterproc⁹ extends it to concurrency (which we will not use
here) and partitioning of the state space according to enumerated types,
including Booleans. In a nutshell, while Interproc assigns a single abstract
element (product of intervals, octagon, polyhedron) to each program location,
ConcurInterproc attaches $2^n$ abstract elements, where n is the number of
Booleans (or, more generally, one per concrete instantiation of the
enumerated variables). In order to achieve this at reasonable cost, the BDD
Apron library uses a compact representation, where identical abstract
elements are shared and the associated set of concrete instantiations is
represented by a binary decision diagram.

Listing 11 contains no Boolean variable (or of any other enumerated type),
thus directly applying ConcurInterproc over it will yield one convex
polyhedron at the end; yet we need to express a disjunction of such polyhedra
(e.g. there is the case where x = y, and the case where x ≠ y, which may be
subdivided into x < y and y < x). Furthermore, inside the loop one would have
to distinguish i < y, i = y, i > y. This is where other analyses of array
properties by abstract interpretation [13, iO, 3 introduce “slices”
or “segments” of programs, often according to syntactic criteria. In our case,
we wish to distinguish certain locations in the array (or combinations of
several locations, as here with three indices x, y, z) according to more
semantic criteria.

Our solution is to introduce observer variables, which are written to but 
never read and whose final value is discarded, but which will guide the analysis 
and the partitioning performed. Here, we choose to have one flag variable per 
access, initially set to “false”, and set to “true” when the access has taken place. 
As previously, we use a precondition $y + z = n - 1 \land x = z$.


Listing 12: Array reversal, transformed and instrumented

contents a, b, c;
int x, y, z;
bool y0,z0,y1,z1,y2,z2,y3,z3,y4,z4;
y0=z0=y1=z1=y2=z2=y3=z3=y4=z4=false;
int i=0, j=n-1;
assume(y+z == n-1); assume(x==z);
if (y == x) { b = a; y0 = true; } if (z == x) { c = a; z0 = true; }
while (i < j) {
  contents tmp1, tmp2;
  if (i == y) {tmp1 = b; y1 = true;} else if (i == z) {tmp1 = c; z1 = true;}
  if (j == y) {tmp2 = b; y2 = true;} else if (j == z) {tmp2 = c; z2 = true;}
  if (i == y) {b = tmp2; y3 = true;} if (i == z) {c = tmp2; z3 = true;}
  if (j == y) {b = tmp1; y4 = true;} if (j == z) {c = tmp1; z4 = true;}
  i++; j--;
}

ConcurInterproc, within 0.16 s, concludes that a = b.

⁷ http://pop-art.inrialpes.fr/people/bjeannet/bjeannet-forge/interproc/

⁸ http://apron.cri.ensmp.fr/library/

⁹ http://pop-art.inrialpes.fr/interproc/concurinterprocweb.cgi

5.5 Dutch national flag 

Quicksort is a divide-and-conquer sorting algorithm: pick a pivot, swap array
cells until the array is divided into two areas: elements less than the pivot,
and elements greater than or equal to it; then recurse in both areas. An
improvement, in case many elements may be identical, is to swap the array into
three areas: elements less than the pivot, equal to it, and greater than it,
and recurse in the “less” and “greater” areas. This three-way partition is
equivalent to the “Dutch national flag problem” [l^ ch. 14], of swapping
pebbles of colors red, white and blue (corresponding to “less”, “equal” and
“greater”) into three segments.

Listing 13: Dutch flag¹⁰

void threeWayPartition(int data[], int size, int low, int high)
{
  int p = -1, q = size;
  for (int i = 0; i < q;) {
    if (data[i] < low) {swap(&data[i], &data[++p]); ++i;}
    else if (data[i] >= high) {swap(&data[i], &data[--q]);} else ++i;
  }}

We transform this program with two indices 0 < x < y < n (remark that this 
is valid only if n > 2) with associated values datax and datay, and instrument 
it with Boolean observer variables: for each read or write access to an index 
i, we keep a Boolean recording the value of predicate x < i and one ior x > i 
(respectively for y). The values in the array are encoded as pebble colors LOW, 
MIDDLE, HIGH. 

ConcurInterproc computes a postcondition within I min. The resulting 
formula (p has 52 cases; we will not print it here. We check that (p A x < 
p -A datax = BLUE, meaning that finally, Va;.0 < a; < p —>■ t\x\ — BLUE 
Similarly, </> A y > q —>• datay = RED, thus Vy.q < y < n ^ t[y] — RED. We 
would expect as well that \/x.p < x < q ^ t[x] = WHITE. Yet, this does not 
immediately follow from cp: cp Ap < y < q A datay = RED is satisfiable! Could 
there be red cells in the supposedly white area? 

Note that $\phi$, for fixed values of $n, p, q$, encodes quadruples
$(x, \mathit{datax}, y, \mathit{datay})$, which encompass all possible values of
$(x, t[x], y, t[y])$ for $x < y$. In particular, for $t[y] = \mathrm{RED}$ to be
possible for given $n, p, q$, one must have suitable $t[x]$ for all $x < y$,
such that $(x, t[x], y, \mathrm{RED})$ satisfies $\phi$ for the same $n, p, q$.
In other words, to have a cell $t[y] = \mathrm{RED}$ one must be able to find
values $t[x]$ for all cells to the left of it. We check that, indeed,
$p < y < q \land \mathit{datay} \neq \mathrm{WHITE} \land (\forall x.\ 0 \le x < y \rightarrow \phi)$
is unsatisfiable (a check in Presburger arithmetic, a decidable theory),
meaning that $\forall y.\ (p < y < q \land y > 0) \rightarrow t[y] = \mathrm{WHITE}$.
Furthermore, $\phi \land x = 0 \land x < q \land \mathit{datax} \neq \mathrm{WHITE}$
has no solution. We can thus conclude
$\forall y.\ p < y < q \rightarrow t[y] = \mathrm{WHITE}$.

(Listing 13 is courtesy of Wikipedia.)

Thus, we encountered a case of “spurious” solutions in the abstract element,
due to the fact that the abstraction is not onto: certain abstract elements
can be reduced to a smaller element with the same concretization. Here, this
was achieved through quantification (see subsection 3.3). This reduction can
thus be performed through some form of quantifier elimination.
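
On finite domains, the reduction can be illustrated by brute force. The sketch
below is only an illustration under strong assumptions: the array length `N`
is fixed to 3, the abstract element is stored as an explicit Boolean table
(whereas the analysis manipulates formulas), and only the right-hand cell $y$
is reduced. All names are hypothetical. A quadruple $(x, dx, y, dy)$ is dropped
when some cell to the left of $y$ admits no value compatible with $t[y] = dy$.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum color { RED, WHITE, BLUE, NCOLORS };
#define N 3 /* hypothetical array length */

/* Abstract element: phi[x][dx][y][dy] is true when the quadruple
   (x, dx, y, dy) with x < y is allowed. */
typedef bool abstract_elt[N][NCOLORS][N][NCOLORS];

/* Abstraction of the single concrete array whose cells are all WHITE. */
static void abstract_all_white(abstract_elt phi)
{
    memset(phi, 0, sizeof(abstract_elt));
    for (int y = 0; y < N; y++)
        for (int x = 0; x < y; x++)
            phi[x][WHITE][y][WHITE] = true;
}

/* A value dy at index y is supported when every index x < y admits some
   value dx such that (x, dx, y, dy) belongs to phi. */
static bool supported(abstract_elt phi, int y, int dy)
{
    for (int x = 0; x < y; x++) {
        bool found = false;
        for (int dx = 0; dx < NCOLORS; dx++)
            if (phi[x][dx][y][dy]) found = true;
        if (!found) return false;
    }
    return true;
}

/* One reduction pass: drop quadruples whose right-hand cell is unsupported.
   Returns the number of quadruples removed. */
static int reduce(abstract_elt phi)
{
    bool keep[N][NCOLORS];
    for (int y = 0; y < N; y++)
        for (int dy = 0; dy < NCOLORS; dy++)
            keep[y][dy] = supported(phi, y, dy);
    int removed = 0;
    for (int x = 0; x < N; x++)
        for (int dx = 0; dx < NCOLORS; dx++)
            for (int y = 0; y < N; y++)
                for (int dy = 0; dy < NCOLORS; dy++)
                    if (phi[x][dx][y][dy] && !keep[y][dy]) {
                        phi[x][dx][y][dy] = false;
                        removed++;
                    }
    return removed;
}
```

For instance, adding the quadruple (0, RED, 2, RED) to the abstraction of the
all-white array yields an element with the same concretization: no array can
have $t[2] = \mathrm{RED}$, since no quadruple with $x = 1$ supports it. The
reduction pass removes exactly that quadruple.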


6 Related work 


Acceleration For certain classes of loops, it is possible to compute exactly 
the transitive closure of the relation r encoding the semantics of the loop, 
within a decidable class. Acceleration for arrays has been studied by Bozga et al. 
ll| , who obtain the transitive closure in the form of a counter automaton. The 
translation from counter automaton to array properties expressed in first-order 
logic then requires an abstraction step, resulting in a loss of precision. Alberti 
et al. SQ proposed a template-based solution. Certain classes of r’s admit 
a definable acceleration in Presburger arithmetic augmented with free function 
symbols, at the price of nested quantifiers. The $\exists^*\forall^*$ fragment
of this theory is undecidable [23]; thus again abstraction is needed to apply
this technique in practice. Yet, there are cases where exact acceleration is
possible Q. Contrary to these approaches, (i) ours does not put restrictions on
the shape of the loop (and the program in general); (ii) we perform the tunable
abstraction first, with the rest of the analysis being delegated to a back-end
(which can possibly use exact acceleration on scalar programs i).


Abstract interpretation Various array abstractions [22], [23], [39], [40], [16]
distinguish slices or segments, whose contents are then abstracted by another
abstract domain. Depending on the approach, relationships between several slices
may or may not be expressed, and the partitioning may be syntactic or based
on some pre-analysis. To the best of our knowledge, none of these approaches
works on multidimensional arrays or on maps, contrary to ours. One major difference
between these approaches and ours is that ours separates the analysis, both in 
theory and implementation, into an abstraction that maps array programs to 
scalar programs and an analysis for the scalar programs, while theirs are more 
“monolithic”. Even though they are parametric in abstract domains for values 
and possibly indexes, they must be used inside an abstract interpreter based on 
Kleene iterations with widening. In contrast, ours can use any back-end analysis 
for scalar programs, including exact acceleration, abstract interpretation with 
Kleene iterations, policy iteration, and even, if a target property is supplied, 
predicate abstraction (see CEGAR below). 


Cox et al. [18] do not target array programs per se, but programs in highly
dynamic object-oriented languages such as JavaScript, where an object is a map
from fields to values and the set of possible field names is not fixed. Dillig et
al. [ 2 ^ overcome the dichotomy of strong vs weak updates with fluid updates.
Their approach is monolithic and cannot express properties such as sortedness.


Predicate abstraction and CEGAR Predicate abstraction starts from the 
control structure of a program and incrementally refines it by splitting control 
states according to predicates chosen by the user [21] or, commonly, obtained
by counterexample-guided abstraction refinement (CEGAR). From an abstract 
counterexample trace not corresponding to a concrete counterexample, they re¬ 
fine the model using local predicates constituting a step-by-step proof that this 
abstract trace does not match any concrete trace. The hope is that this proof 
generalizes to more counterexample traces and that the predicates eventually 
converge to define an inductive invariant. The predicates are obtained from 
Craig interpolants 3^ 3^ 351 extracted from the proof of unsatisfiability pro¬ 
duced by a satisfiability modulo theory (SMT) solver. The difficulty here is to 
generate Craig interpolants that tend to generalize to inductive invariants, on 
quantified formulas involving arrays [s^ . We are interested in predicates such as
$\forall\, 0 \le k < i,\ t[k] = 0$, which generalizes to an inductive invariant
on Program 1, as opposed to, say, $t[0] = 0 \land t[1] = 0$, which is equivalent
for $i = 2$ but does not generalize to arbitrary $i$. In order to achieve
practical scalability, some works restrict the inferred array predicates to
certain forms, e.g. range predicates [31]. Others tune the interpolating
procedure towards the generation of better interpolants 0,0. A major difference
between our approach
and those based on CEGAR is that we do not require a “target” property to 
prove, which is necessary for having counterexamples, though we can use one if 
needed. If such a property is provided, our approach can use as a back-end a 
CEGAR system limited to scalar variables. 


Theorem proving and SMT-based approaches The generation of invariants
for programs with arrays has also been studied using automated theorem
proving [26], [27]; this approach is generally limited by the fact that theory
reasoning (e.g. arithmetic) and superposition-based deductive reasoning (on which
the Vampire first-order theorem prover is based [32]) are not yet efficiently
integrated. As opposed to 0, we do not rely on quantifier-instantiation procedures.


Quantification Flanagan et al. [21] also use Skolem constants that they
quantify universally after analysis steps. As opposed to us, they require the
user to specify the predicates on which the program will be abstracted.


Abstraction of sets of maps Our approach generalizes a classical abstraction
of sets of maps 1^, §2.1]. Jeannet et al. [29] considered the problem of
abstracting sets of functions of signature $D_1 \rightarrow D_2$, assuming a
finite abstract domain $A_1$ of cardinality $n$ abstracting subsets of $D_1$
and an abstract domain $A_2$ abstracting subsets of $D_2$. In contrast, we do
not make any cardinality assumption.


Partitioning Rival et al. introduced partitioning according to an abstraction
of the history of the computation. Our approach using observer variables
with ConcurInterproc (subsection 5.4) is akin to considering a finite
abstraction of the trace of reads/writes into a given array.


7 Conclusion and Future Work 

We have shown that a number of properties of array programs can be proved by
abstracting the array $a$ using a few symbolic cells $a[x], a[y], \ldots$: the
program is automatically translated into a scalar program, a static analyzer is
run over the scalar program, and the resulting invariant is translated back to
the original program. In some cases, a form of quantifier elimination is used
over the resulting formulas.

Our approach is not specific to arrays, and can be applied to any map
structure $X \rightarrow Y$ (e.g. hash tables and other container classes). A
possible future extension is multiset properties, a multiset being a map
$A \rightarrow \mathbb{N}$.

The main weakness of our approach is the need for a rather precise back-end
analysis (for the scalar program obtained by translation). Our experiments
highlighted some inefficiencies in e.g. Flata and ConcurInterproc: in the
former, many paths can be enumerated and complicated formulas generated
even though a much simpler equivalent form exists; in the latter, polyhedra that
are only slightly different (say, one constraint is different) are handled wholly
separately. This gives immediate directions for research for improving exact
acceleration, as in Flata, or disjunctions of polyhedra, as in ConcurInterproc.
Another difficulty, if using ConcurInterproc or other tools focusing
on convex sets of integer vectors, is the need to use observer variables and/or
an auxiliary pre-analysis to “focus” the main analysis.

We stress again that we obtained our results using unmodified versions of
very different back-end analyzers (ConcurInterproc, Flata, CPAChecker),
which testifies to the flexibility of our approach. Performance and precision
improvements can be expected by modifying the back-end analyzers (e.g. precision
could be improved by performing reduction steps during the analysis, rather
than after the computation of the invariants).


A Proofs

Lemma 1. The forward and backward semantics of Program abstract the
forward and backward semantics of r=f[i]; by the $(\alpha_f, \gamma_f)$ Galois
connection.

Proof. Consider an abstraction
$x^\sharp \subseteq S \times B \times A \times (A \times B)$ of $(s, r, i, f)$:
$\forall a \in A\ (s, r, i, a, f[a]) \in x^\sharp$. The image of the set
$x^\sharp$ by that program is

$y^\sharp = \{(s, r', i, a, b) \mid r' \in B \land i \neq a \land (s, r, i, a, b) \in x^\sharp\} \cup \{(s, b, i, i, b) \mid (s, r, i, i, b) \in x^\sharp\}.$

It is clear that $(s, f(i), i, f) \in \gamma(y^\sharp)$, in other words
$\forall a \in A\ (s, f(i), i, a, f[a]) \in y^\sharp$.

The pre-image of the set $x^\sharp$ by that program is

$z^\sharp = \{(s, r, i, a, b) \mid r \in B \land i \neq a \land (s, r', i, a, b) \in x^\sharp\} \cup \{(s, r, i, i, b) \mid r \in B \land (s, b, i, i, b) \in x^\sharp\}.$

Assume $(s, r', i, f) \in \gamma(x^\sharp)$ and
$(s, r, i, f) \rightarrow (s, r', i, f)$; then
• either $r' \neq f(i)$: then there is no such $(s, r, i, f)$, thus any such
$(s, r, i, f) \in \gamma(z^\sharp)$;
• or $r' = f(i)$: then any $(s, r, i, f)$ fits; let us now prove
$(s, r, i, f) \in \gamma(z^\sharp)$: let $a \in A$, then either $i = a$ and
$(s, r, i, a, f[a]) \in z^\sharp$ (second disjunct), or $i \neq a$ and
$(s, r, i, a, f[a]) \in z^\sharp$ (first disjunct). □
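
The forward part of the argument can be checked mechanically on small finite
domains. The following C sketch makes several simplifying assumptions that are
ours and not part of the lemma: the scalar component $s$ is dropped, the sets
are $A = \{0, 1, 2\}$ and $B = \{0, 1\}$, and abstract elements are stored as
explicit Boolean tables. The function `read_image` follows the two disjuncts of
$y^\sharp$ given in the proof; all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NA 3 /* hypothetical index set A = {0, 1, 2} */
#define NB 2 /* hypothetical value set B = {0, 1} */

/* Abstract element: set of tuples (r, i, a, b); the scalar part s is omitted. */
typedef bool abs_set[NB][NA][NA][NB];

/* alpha of one concrete state (r, i, f): all tuples (r, i, a, f[a]). */
static void alpha(abs_set x, int r, int i, const int f[NA])
{
    memset(x, 0, sizeof(abs_set));
    for (int a = 0; a < NA; a++) x[r][i][a][f[a]] = true;
}

/* Membership in gamma: (r, i, f) is represented when every cell agrees. */
static bool in_gamma(abs_set x, int r, int i, const int f[NA])
{
    for (int a = 0; a < NA; a++)
        if (!x[r][i][a][f[a]]) return false;
    return true;
}

/* Image of x by "r = f[i];", following the two disjuncts of the proof. */
static void read_image(abs_set x, abs_set y)
{
    memset(y, 0, sizeof(abs_set));
    for (int r = 0; r < NB; r++)
        for (int i = 0; i < NA; i++)
            for (int a = 0; a < NA; a++)
                for (int b = 0; b < NB; b++) {
                    if (!x[r][i][a][b]) continue;
                    if (i != a) {
                        /* first disjunct: the new r' is arbitrary */
                        for (int r2 = 0; r2 < NB; r2++) y[r2][i][a][b] = true;
                    } else {
                        /* second disjunct: the new r' equals the cell value b */
                        y[b][i][i][b] = true;
                    }
                }
}
```

Starting from the abstraction of a single concrete state, the image contains
the state obtained after the read, and rejects the state in which `r` kept a
value different from `f[i]`.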

Lemma 2. The forward and backward semantics of Program abstract the
forward and backward semantics of f[i]=r; by the $(\alpha_f, \gamma_f)$ Galois
connection.

Proof. Consider an abstraction
$x^\sharp \subseteq S \times B \times A \times (A \times B)$ of $(s, r, i, f)$:
$\forall a \in A\ (s, r, i, a, f[a]) \in x^\sharp$. The image of the set
$x^\sharp$ by that program is

$y^\sharp = \{(s, r, i, a, b) \mid i \neq a \land (s, r, i, a, b) \in x^\sharp\} \cup \{(s, r, i, i, r) \mid (s, r, i, a, b) \in x^\sharp\}.$

Let us prove that $(s, r, i, f[i \mapsto r]) \in \gamma(y^\sharp)$. Let
$a \in A$. If $a \neq i$, then
$(s, r, i, a, f[i \mapsto r](a)) = (s, r, i, a, f(a)) \in y^\sharp$ (first
disjunct); if $a = i$, then
$(s, r, i, a, f[i \mapsto r](a)) = (s, r, i, i, r) \in y^\sharp$ (second
disjunct).

The pre-image of the set $x^\sharp$ by that program is

$z^\sharp = \{(s, r, i, i, b') \mid b' \in B \land (s, r, i, i, b) \in x^\sharp\} \cup \{(s, r, i, a, b) \mid i \neq a \land (s, r, i, a, b) \in x^\sharp\}.$

Assume $(s, r, i, f') \in \gamma(x^\sharp)$ and
$(s, r, i, f) \rightarrow (s, r, i, f')$; let us prove
$(s, r, i, f) \in \gamma(z^\sharp)$. Let $a \in A$. If $a = i$, then
$(s, r, i, i, f(i)) \in z^\sharp$ (first disjunct). If $a \neq i$, then
$(s, r, i, a, f(a)) = (s, r, i, a, f'(a)) \in z^\sharp$ (second disjunct). □
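
The image of a write can be exercised in the same finite setting. As before,
this is a sketch under our own assumptions (the scalar component $s$ is
omitted, $A = \{0, 1, 2\}$, $B = \{0, 1\}$, explicit Boolean tables, and
hypothetical names): `write_image` implements the two disjuncts of $y^\sharp$
for the write, and `covers_update` tests that the updated map
$f[i \mapsto r]$ belongs to the concretization of the result.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NA 3 /* hypothetical index set A = {0, 1, 2} */
#define NB 2 /* hypothetical value set B = {0, 1} */

/* Abstract element: set of tuples (r, i, a, b); the scalar part s is omitted. */
typedef bool abs_set[NB][NA][NA][NB];

/* Image of x by "f[i] = r;": cells other than i keep their value (first
   disjunct), and the cell a = i receives r (second disjunct). */
static void write_image(abs_set x, abs_set y)
{
    memset(y, 0, sizeof(abs_set));
    for (int r = 0; r < NB; r++)
        for (int i = 0; i < NA; i++)
            for (int a = 0; a < NA; a++)
                for (int b = 0; b < NB; b++) {
                    if (!x[r][i][a][b]) continue;
                    if (i != a) y[r][i][a][b] = true;
                    y[r][i][i][r] = true;
                }
}

/* Check that the updated map f[i -> r] is represented by y. */
static bool covers_update(abs_set y, int r, int i, const int f[NA])
{
    for (int a = 0; a < NA; a++) {
        int v = (a == i) ? r : f[a];
        if (!y[r][i][a][v]) return false;
    }
    return true;
}
```

Note that the old content of cell `i` disappears from the image: the write is
handled as a strong update on the tracked cell.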

Theorem 1. Consider a loop-free array program $P$ with arrays $a_1, \ldots, a_d$
such that the numbers of accesses to these arrays are respectively
$\alpha_1, \ldots, \alpha_d$. By abstracting these arrays with, respectively,
$n_1, \ldots, n_d$ indices such that $n_i \ge \alpha_i$ for all $i$, we obtain a
Galois connection $(\alpha, \gamma)$ such that
$\pi_S \circ \gamma \circ \llbracket P \rrbracket^\sharp \circ \alpha = \pi_S \circ \llbracket P \rrbracket$,
where $\pi_S$ is the projection of the state to the scalar variables.


Proof. Consider an execution trace $T$ in $P$, and record the indices of the
$j$-th (numbered syntactically) access to the $i$-th array. Consider now the
program $P'$ obtained by abstracting $P$ according to $\alpha_i$ indices for
each array $a_i$, i.e. each read $r := a_i[e]$ is transformed into

r = random();
if (e == x_{i,1}) { assume(r == b_{i,1}); } if (e == x_{i,2}) { assume(r == b_{i,2}); } ...

and each write $a_i[e] := w$ as

if (e == x_{i,1}) { b_{i,1} = w; } if (e == x_{i,2}) { b_{i,2} = w; } ...

Now replay $T$ in $P'$, with the same initial values, the same external and
nondeterministic choices, and each $x_{i,j}$ set to the index recorded for the
$j$-th access to the $i$-th array. Then, for any array access in the
execution of $P'$, at least one of the tests is taken (the program does not fall
into the case where none of the selected indices match the index for the
read/write instruction). In the case of a read $r := a_i[e]$, the value read in
$P'$ is then the same as the one read in $P$. Then, the execution of $P'$
faithfully mimics that of $P$. The final values for the execution of $T$ in $P'$
are thus the same as those in $P$, which proves the statement. □
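
The replay argument can be illustrated on a tiny loop-free program with two
array accesses. The sketch below is a hedged illustration and not the actual
translation tool: the structure `cells` and the functions `abs_read`,
`abs_write`, `concrete` and `translated` are hypothetical, a single array with
two tracked cells is assumed, and the `random()`/`assume` pair of the
translated read is modelled by directly returning the tracked content when a
test matches.

```c
#include <assert.h>
#include <stdbool.h>

/* Two tracked cells of one array: indices x[j] and contents b[j]. */
struct cells { int x[2]; int b[2]; };

/* Translated read "r := a[e]": succeeds when e hits a tracked index.
   When no test matches, r would stay an arbitrary random() value; the
   sketch reports that case by returning false. */
static bool abs_read(const struct cells *c, int e, int *r)
{
    bool hit = false;
    for (int j = 0; j < 2; j++)
        if (e == c->x[j]) { *r = c->b[j]; hit = true; }
    return hit;
}

/* Translated write "a[e] := w": update every tracked cell whose index is e. */
static void abs_write(struct cells *c, int e, int w)
{
    for (int j = 0; j < 2; j++)
        if (e == c->x[j]) c->b[j] = w;
}

/* Concrete loop-free program with two accesses: a[k] := a[i] + 1.
   Returns the scalar r. */
static int concrete(int a[], int i, int k)
{
    int r = a[i];
    a[k] = r + 1;
    return r;
}

/* Its translation, replayed with the tracked indices set to the indices
   actually accessed (x[0] = i, x[1] = k) and the same initial contents. */
static int translated(const int a[], int i, int k, struct cells *c)
{
    c->x[0] = i; c->b[0] = a[i];
    c->x[1] = k; c->b[1] = a[k];
    int r = 0;
    bool hit = abs_read(c, i, &r);
    assert(hit); /* at least one test is taken */
    abs_write(c, k, r + 1);
    return r;
}
```

With as many tracked cells as there are accesses, the scalar `r` computed by
the translated program coincides with the one computed by the concrete
program, and the tracked contents agree with the final array.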